Thompson Sampling
Frequentist Regret Analysis of Gaussian Process Thompson Sampling via Fractional Posteriors
Somjit Roy, Prateek Jaiswal, Anirban Bhattacharya, Debdeep Pati, Bani K. Mallick
We study Gaussian Process Thompson Sampling (GP-TS) for sequential decision-making over compact, continuous action spaces and provide a frequentist regret analysis based on fractional Gaussian process posteriors, without relying on domain discretization as in prior work. We show that the variance inflation commonly assumed in existing analyses of GP-TS can be interpreted as Thompson Sampling with respect to a fractional posterior with tempering parameter $\alpha \in (0,1)$. We derive a kernel-agnostic regret bound expressed in terms of the maximum information gain $\gamma_t$ and the posterior contraction rate $\epsilon_t$, and identify conditions on the Gaussian process prior under which $\epsilon_t$ can be controlled. As special cases of our general bound, we recover regret of order $\tilde{\mathcal{O}}(T^{\frac{1}{2}})$ for the squared exponential kernel, $\tilde{\mathcal{O}}(T^{\frac{2\nu+3d}{2(2\nu+d)}})$ for the Matérn-$\nu$ kernel, and a bound of the same order $\tilde{\mathcal{O}}(T^{\frac{2\nu+3d}{2(2\nu+d)}})$ for the rational quadratic kernel. Overall, our analysis provides a unified and discretization-free regret framework for GP-TS that applies broadly across kernel classes.
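To make the fractional-posterior view concrete: for a Gaussian likelihood, tempering by $\alpha$ is equivalent to inflating the observation-noise variance to $\sigma^2/\alpha$, which inflates the GP posterior covariance and hence the spread of Thompson samples. Below is a minimal NumPy sketch of one GP-TS round under this tempered posterior; the squared-exponential kernel, lengthscale, and candidate-grid setup are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=0.2):
    """Squared-exponential kernel matrix between row-stacked inputs A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def gp_ts_step(X, y, candidates, alpha=0.5, noise_var=0.1, rng=None):
    """One GP-TS decision under a fractional posterior.

    Tempering the Gaussian likelihood by alpha is equivalent to using
    noise variance noise_var / alpha, which inflates the posterior
    covariance -- the 'variance inflation' the analysis reinterprets.
    """
    rng = np.random.default_rng() if rng is None else rng
    K = rbf_kernel(X, X) + (noise_var / alpha) * np.eye(len(X))
    Ks = rbf_kernel(candidates, X)
    Kss = rbf_kernel(candidates, candidates)
    L = np.linalg.cholesky(K)
    A = np.linalg.solve(L, Ks.T)                 # L^{-1} Ks^T, shape (n, m)
    mean = A.T @ np.linalg.solve(L, y)           # Ks K^{-1} y
    cov = Kss - A.T @ A                          # Kss - Ks K^{-1} Ks^T
    sample = rng.multivariate_normal(mean, cov + 1e-8 * np.eye(len(mean)))
    return int(np.argmax(sample))                # index of the action to play
```

Setting $\alpha = 1$ recovers the standard GP posterior; $\alpha < 1$ widens it, producing exactly the extra exploration that prior analyses had to assume.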
Bandit Allocational Instability
When multi-armed bandit (MAB) algorithms allocate pulls among competing arms, the resulting allocation can exhibit substantial run-to-run variation. This is particularly harmful in modern applications such as learning-enhanced platform operations and post-bandit statistical inference. Thus motivated, we introduce a new performance metric for MAB algorithms, termed allocation variability, defined as the largest (over arms) standard deviation of an arm's number of pulls. We establish a fundamental trade-off between allocation variability and regret, the canonical performance metric of reward maximization. In particular, for any algorithm, the worst-case regret $R_T$ and worst-case allocation variability $S_T$ must satisfy $R_T \cdot S_T = \Omega(T^{\frac{3}{2}})$ as $T \rightarrow \infty$, as long as $R_T = o(T)$. This implies that any minimax regret-optimal algorithm must incur worst-case allocation variability $\Theta(T)$, the largest possible scale, while any algorithm with sublinear worst-case regret must necessarily incur $S_T = \omega(\sqrt{T})$. We further show that this lower bound is essentially tight: any point on the Pareto frontier $R_T \cdot S_T = \tilde{\Theta}(T^{3/2})$ can be achieved by a simple tunable algorithm, UCB-f, a generalization of the classic UCB1. Finally, we discuss implications for platform operations and for statistical inference when bandit algorithms are used. As a byproduct of our results, we resolve an open question of Praharaj and Khamaru (2025).
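The abstract names UCB-f only as a tunable generalization of UCB1, so the sketch below is a guess at its shape: a confidence radius $\sqrt{f(t)/n_a}$ replaces UCB1's $\sqrt{2\ln t / n_a}$, with the schedule f dialing the allocation between regret-optimal and low-variability behavior. The bonus form, function name, and Gaussian reward simulation are assumptions for illustration.

```python
import numpy as np

def ucb_f(means, T, f, rng=None):
    """Simulate a K-armed Gaussian bandit under a UCB rule with a
    tunable confidence schedule f. With f(t) = 2 * log(t) this is
    essentially classic UCB1; a larger f spreads pulls more evenly
    (lower allocation variability) at the cost of extra regret.
    NOTE: the exact bonus used by the paper's UCB-f is assumed here.
    `means` are the true arm means, used only to simulate rewards."""
    rng = np.random.default_rng() if rng is None else rng
    k = len(means)
    pulls = np.zeros(k)
    sums = np.zeros(k)
    for a in range(k):                       # initialize: pull each arm once
        pulls[a] += 1
        sums[a] += means[a] + rng.normal()
    for t in range(k + 1, T + 1):
        bonus = np.sqrt(f(t) / pulls)
        a = int(np.argmax(sums / pulls + bonus))
        pulls[a] += 1
        sums[a] += means[a] + rng.normal()
    return pulls                             # per-arm allocation after T rounds

# e.g. allocation under a roughly UCB1-like schedule:
# ucb_f(means=[0.5, 0.4, 0.3], T=10_000, f=lambda t: 2 * np.log(t))
```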
BFTS: Thompson Sampling with Bayesian Additive Regression Trees
Ruizhe Deng, Bibhas Chakraborty, Ran Chen, Yan Shuo Tan
Contextual bandits are a core technology for personalized mobile health interventions, where decision-making requires adapting to complex, non-linear user behaviors. While Thompson Sampling (TS) is a preferred strategy for these problems, its performance hinges on the quality of the underlying reward model. Standard linear models suffer from high bias, while neural network approaches are often brittle and difficult to tune in online settings. Meanwhile, tree ensembles dominate tabular data prediction but typically rely on heuristic uncertainty quantification, lacking a principled probabilistic basis for TS. We propose Bayesian Forest Thompson Sampling (BFTS), the first contextual bandit algorithm to integrate Bayesian Additive Regression Trees (BART), a fully probabilistic sum-of-trees model, directly into the exploration loop. We prove that BFTS is theoretically sound, deriving an information-theoretic Bayesian regret bound of $\tilde{O}(\sqrt{T})$. As a complementary result, we establish frequentist minimax optimality for a "feel-good" variant, confirming the structural suitability of BART priors for non-parametric bandits. Empirically, BFTS achieves state-of-the-art regret on tabular benchmarks with near-nominal uncertainty calibration. Furthermore, in an offline policy evaluation on the Drink Less micro-randomized trial, BFTS improves engagement rates by over 30% compared to the deployed policy, demonstrating its practical effectiveness for behavioral interventions.
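A minimal sketch of the Thompson Sampling template the abstract describes, with the reward model left pluggable: in BFTS the posterior draw would be one MCMC draw of the BART sum-of-trees function, but any Bayesian regression model exposing posterior function draws fits the same loop. The function names and the concatenated context-arm featurization are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def ts_round(X_hist, y_hist, context, arm_features, posterior_draw):
    """One round of contextual Thompson Sampling with a pluggable
    Bayesian reward model. posterior_draw(X, y) must return a single
    function draw f_tilde from the model's posterior over reward
    functions; in BFTS this would be one sampled BART tree ensemble."""
    f_tilde = posterior_draw(X_hist, y_hist)          # sample a reward function
    feats = [np.concatenate([context, a]) for a in arm_features]
    scores = [f_tilde(x) for x in feats]              # score each arm under the draw
    return int(np.argmax(scores))                     # act greedily on the sample
```

The key design point the abstract emphasizes is that the posterior draw is fully probabilistic rather than a heuristic ensemble spread, so the randomness driving exploration comes directly from the model's uncertainty.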